Computer Science Carnegie Mellon DISTRIBUTION

Authors

  • Bryan Singer
  • Manuela Veloso
Abstract

When given several problems to solve in some domain, a standard reinforcement learner learns an optimal policy from scratch for each problem. If the domain has characteristics that are independent of the goal and the problem, the learner might be able to take advantage of previously solved problems. Unfortunately, it is generally infeasible to apply a learned policy directly to new problems. This paper presents a method that biases exploration using solutions to previous problems, and the method is shown to speed up learning on new problems. We first allow a Q-learner to learn the optimal policies for several problems. We describe each state in terms of local features, assuming that these state features, together with the learned policies, can be used to abstract the domain characteristics away from the specific layout of states and rewards in a particular problem. We then use a classifier to learn this abstraction from training examples extracted from each learned Q-table. The trained classifier maps state features to the potentially goal-independent successful actions in the domain. Given a new problem, we include the output of the classifier as an exploration bias to improve the rate of convergence of the reinforcement learner. We have validated our approach empirically. In this paper, we report results in the complex domain of Sokoban, which we introduce.
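The pipeline outlined in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `extract_examples`, `MajorityClassifier`, and `choose_action`, and the parameters `epsilon` and `bias`, are hypothetical. The abstract does not name the classifier, so a simple majority vote per feature tuple stands in for it. Labeling each state with its greedy action, and letting the classifier's suggestion replace a fraction of the random exploratory moves, are likewise assumptions about how the training examples and the exploration bias might be realized.

```python
import random
from collections import Counter, defaultdict


def extract_examples(q_table, features):
    """Turn a learned Q-table into (state features, greedy action) pairs.

    Assumption: the training label for a state is its highest-valued action.
    """
    examples = []
    for state, action_values in q_table.items():
        best_action = max(action_values, key=action_values.get)
        examples.append((features[state], best_action))
    return examples


class MajorityClassifier:
    """Stand-in classifier: predicts the most frequent action per feature tuple."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def fit(self, examples):
        for feats, action in examples:
            self.votes[feats][action] += 1
        return self

    def predict(self, feats):
        # Return None for feature tuples never seen during training.
        if feats not in self.votes:
            return None
        return self.votes[feats].most_common(1)[0][0]


def choose_action(action_values, suggested, epsilon, bias, rng):
    """Epsilon-greedy choice whose exploratory steps favor the suggestion.

    With probability 1 - epsilon the greedy action is taken. Otherwise the
    classifier's suggestion is used with probability `bias`, and a uniformly
    random action is taken the rest of the time.
    """
    actions = list(action_values)
    if rng.random() >= epsilon:
        return max(actions, key=action_values.get)
    if suggested is not None and rng.random() < bias:
        return suggested
    return rng.choice(actions)


# Hypothetical Q-table from one solved problem, with hand-made local features.
solved_q = {"s0": {"up": 0.1, "right": 0.9}, "s1": {"up": 0.7, "right": 0.2}}
solved_features = {"s0": ("wall_up",), "s1": ("free",)}

clf = MajorityClassifier().fit(extract_examples(solved_q, solved_features))

# On a new problem the Q-values start uninformed, so the bias guides exploration.
new_q_row = {"up": 0.0, "right": 0.0}
suggestion = clf.predict(("wall_up",))
action = choose_action(new_q_row, suggestion, epsilon=1.0, bias=1.0,
                       rng=random.Random(0))
```

Because the suggestion only affects exploratory steps, the Q-learner's own value estimates still decide the greedy action, so a poor suggestion slows learning in this sketch rather than fixing the final policy.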


Similar resources

Soar as a Unified Theory of Cognition: Spring 1990

Richard L. Lewis School of Computer Science, Carnegie Mellon University Scott B. Huffman Department of Electrical Engineering and Computer Science, University of Michigan Bonnie E. John School of Computer Science, Carnegie Mellon University John E. Laird Department of Electrical Engineering and Computer Science, University of Michigan Jill Fain Lehman School of Computer Science, Carnegie Mellon...


An Algorithm to Compute the Stochastically Stable Distribution of a Perturbed Markov Matrix

Abstract of “An Algorithm to Compute the Stochastically Stable Distribution of a Perturbed Markov Matrix” by John R. Wicks, Ph.D., Brown University, August 2008. Recently, some researchers have attempted to exploit state-aggregation techniques to compute stable distributions of high-dimensional Markov matrices (Gambin and Pokarowski, 2001). While these researchers have devised an efficient, recursive al...


A Principled Massive-Graph Similarity Function with Attribution

Danai Koutra, Computer Science and Engineering, University of Michigan, Ann Arbor. Neil Shah, Computer Science Department, Carnegie Mellon University. Joshua T. Vogelstein, Department of Biomedical Engineering & Institute of Computational Medicine, Johns Hopkins University Child Mind Institute. Brian Gallagher, Lawrence Livermore National Laboratory. Christos Faloutsos, Computer Science Departm...


Structure and magnetic properties of L10-FePt thin films on TiN/RuAl underlayers

En Yang, Sutatch Ratanaphan, Jian-Gang Zhu, and David E. Laughlin Data Storage Systems Center, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA ABB Professor of Engineering Department of Electrical and Computer Engineering, Data Storage Systems Center, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 USA ALCOA Professor of Physical Metallurgy Materials Scien...


CMurfs: Carnegie Mellon United Robots for Soccer

In RoboCup’2009 SPL, Carnegie Mellon University participated as CMWrEagle, a joint team between the University of Science and Technology of China, led by Professor Xiaoping Chen, and Carnegie Mellon University, led by Professor Manuela Veloso. For RoboCup’2010, Carnegie Mellon University will be participating as a sole team: CMurfs Carnegie Mellon United Robots for Soccer. We are investigating ...


Abstraction and Counterexample-Guided Refinement in Model Checking of Hybrid Systems

Abstraction and Counterexample-Guided Refinement in Model Checking of Hybrid Systems∗ Edmund Clarke, Ansgar Fehnker, Zhi Han, Bruce Krogh, Joël Ouaknine, Olaf Stursberg, Michael Theobald 1 Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA 2 Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA 3 Process Control Lab, University of Dor...



Publication date: 1999